feat(voice-server + installer): Google Cloud TTS + cross-platform audio by fayerman-source · Pull Request #872 · danielmiessler/Personal_AI_Infrastructure

fayerman-source · 2026-03-02T19:20:10Z

Summary

Google Cloud TTS as an alternative provider alongside ElevenLabs, configurable via settings.json → daidentity.ttsProvider
Cross-platform audio playback — detects afplay (macOS), mpv/ffplay/aplay (Linux/WSL2) at startup
Cross-platform desktop notifications — osascript (macOS), notify-send (Linux) with silent fallback
Backwards compatible — defaults to ElevenLabs when ttsProvider is not set
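Startup player detection can be sketched as a pure function that picks the first installed player for the host platform. This is a minimal sketch, not the PR's code. The function name `pickAudioPlayer`, the injected `hasCommand` probe, and the per-player flags are assumptions for illustration.

```typescript
// Hypothetical sketch of startup audio-player detection; not the PR's actual code.
type AudioPlayer = { command: string; args: (file: string) => string[] };

// Candidate players in preference order per platform (order taken from the PR summary).
const CANDIDATES: Record<string, AudioPlayer[]> = {
  darwin: [{ command: "afplay", args: (f) => [f] }],
  linux: [
    // mpv without opening a video window, with console output suppressed
    { command: "mpv", args: (f) => ["--no-video", "--really-quiet", f] },
    // ffplay: no display window, exit when playback ends, no log noise
    { command: "ffplay", args: (f) => ["-nodisp", "-autoexit", "-loglevel", "quiet", f] },
    // aplay does not decode MP3 (WAV/raw PCM only), so it is the last resort
    { command: "aplay", args: (f) => ["-q", f] },
  ],
};

// Returns the first installed player for the platform, or null if none is found.
// `hasCommand` is injected so the probe (e.g. `command -v`) stays swappable and testable.
function pickAudioPlayer(
  platform: string,
  hasCommand: (cmd: string) => boolean,
): AudioPlayer | null {
  for (const player of CANDIDATES[platform] ?? []) {
    if (hasCommand(player.command)) return player;
  }
  return null;
}
```

On WSL2, Node reports `process.platform` as `"linux"`, so the Linux candidate list applies there as well.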

Why

ElevenLabs free tier is 10K chars/month; Google Cloud free tier is 4M chars/month (Standard) or 1M (WaveNet/Neural2)
Voice server used macOS-only afplay and osascript, making it non-functional on Linux/WSL2
No new dependencies — uses Google's REST API directly via fetch
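Calling Google's REST API with plain `fetch` can look roughly like the sketch below. It uses the public `v1/text:synthesize` endpoint, whose JSON response carries base64 audio in `audioContent`. The function name and the injected `fetchFn` are hypothetical; `voiceType` from the settings example is left out because it is not a field of the API's voice selection.

```typescript
// Minimal sketch of a Google Cloud TTS call over REST (no SDK). Field names mirror the
// settings.json example; `synthesizeGoogleCloud` itself is a hypothetical name.
interface GoogleCloudVoice {
  languageCode: string;
  voiceName: string;
  speakingRate?: number;
  pitch?: number;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const SYNTH_URL = "https://texttospeech.googleapis.com/v1/text:synthesize";

// Returns base64-encoded MP3 audio (the API's `audioContent` field).
async function synthesizeGoogleCloud(
  text: string,
  apiKey: string,
  voice: GoogleCloudVoice,
  fetchFn: FetchLike,
): Promise<string> {
  const res = await fetchFn(`${SYNTH_URL}?key=${encodeURIComponent(apiKey)}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      input: { text },
      voice: { languageCode: voice.languageCode, name: voice.voiceName },
      audioConfig: {
        audioEncoding: "MP3",
        speakingRate: voice.speakingRate ?? 1.0,
        pitch: voice.pitch ?? 0.0,
      },
    }),
  });
  if (!res.ok) throw new Error(`Google Cloud TTS request failed: HTTP ${res.status}`);
  const data = (await res.json()) as { audioContent?: string };
  if (!data.audioContent) throw new Error("Google Cloud TTS response had no audioContent");
  return data.audioContent;
}
```

In a server, `fetchFn` would simply be the global `fetch`; the caller then decodes the base64 string into a temporary file for the audio player.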

Configuration

Add to ~/.env:

GOOGLE_CLOUD_API_KEY=your_key_here
# or reuse existing: GOOGLE_API_KEY=your_key_here (both accepted)
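Accepting either variable name reduces to a small lookup. This sketch assumes `GOOGLE_CLOUD_API_KEY` takes precedence and that blank values count as unset; the PR only states that both names are accepted. `resolveGoogleApiKey` is a hypothetical name.

```typescript
// Sketch: resolve the Google key from an env map, preferring GOOGLE_CLOUD_API_KEY.
// The precedence order is an assumption; the PR only says both names are accepted.
function resolveGoogleApiKey(env: Record<string, string | undefined>): string | null {
  for (const name of ["GOOGLE_CLOUD_API_KEY", "GOOGLE_API_KEY"]) {
    const value = env[name]?.trim();
    if (value) return value; // blank or whitespace-only values are treated as unset
  }
  return null;
}
```

Usage would be `resolveGoogleApiKey(process.env)` once ~/.env has been loaded into the environment.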

Add to ~/.claude/settings.json:

{
  "daidentity": {
    "ttsProvider": "google-cloud",
    "googleCloudVoice": {
      "languageCode": "en-US",
      "voiceName": "en-US-Neural2-D",
      "voiceType": "NEURAL2",
      "speakingRate": 1.0,
      "pitch": 0.0
    }
  }
}

Or keep using ElevenLabs by not setting ttsProvider (or setting it to "elevenlabs").
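The backwards-compatible default amounts to switching providers only on an explicit opt-in. In this sketch, the function name is hypothetical, and treating unrecognized values as ElevenLabs is an assumption rather than documented behavior.

```typescript
// Sketch of the backwards-compatible provider choice described above.
type TtsProvider = "elevenlabs" | "google-cloud";

interface DaIdentity {
  ttsProvider?: string;
}

// Missing or "elevenlabs" keeps the old behavior; only an explicit "google-cloud" switches.
// Treating unknown values as ElevenLabs is an assumption, not confirmed by the PR.
function resolveTtsProvider(settings: { daidentity?: DaIdentity }): TtsProvider {
  return settings.daidentity?.ttsProvider === "google-cloud" ? "google-cloud" : "elevenlabs";
}
```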

Test plan

Voice server starts with ttsProvider: "google-cloud" and logs the correct provider
Health endpoint shows voice_system: "google-cloud", google_cloud_configured: true, and the detected audio_player
TTS generates and plays audio end-to-end on Linux/WSL2 via mpv
Backwards compatible — omitting ttsProvider defaults to ElevenLabs
Verify on macOS with afplay
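The health-endpoint check in the test plan could be scripted against the three fields it names. The response shape below (in particular `audio_player` being a string or null) is an assumption, and the endpoint's URL and port are not given here; parse its JSON and pass it in.

```typescript
// Hypothetical shape of the health response fields named in the test plan.
interface HealthReport {
  voice_system: string;
  google_cloud_configured: boolean;
  audio_player: string | null;
}

// Returns a list of problems; an empty list means the Google Cloud setup looks healthy.
function checkGoogleCloudHealth(raw: unknown): string[] {
  if (!raw || typeof raw !== "object") return ["health response is not an object"];
  const h = raw as Partial<HealthReport>;
  const problems: string[] = [];
  if (h.voice_system !== "google-cloud") problems.push(`voice_system is ${String(h.voice_system)}`);
  if (h.google_cloud_configured !== true) problems.push("google_cloud_configured is not true");
  if (!h.audio_player) problems.push("no audio_player detected");
  return problems;
}
```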

Context

Re-implementation of #687 (closed during v4.0 restructuring) with additional Linux/WSL2 cross-platform support. Tested live on WSL2 with Google Cloud Neural2-D voice.

🤖 Generated with Claude Code

Adds Google Cloud Text-to-Speech as an alternative TTS provider and fixes audio playback on Linux (WSL2).

- Google Cloud TTS via REST API (no SDK), configurable in settings.json
- Cross-platform audio: afplay (macOS), mpv/ffplay/aplay (Linux)
- Cross-platform notifications: osascript (macOS), notify-send (Linux)
- Backwards compatible: defaults to ElevenLabs when ttsProvider is unset
- Accepts GOOGLE_CLOUD_API_KEY or GOOGLE_API_KEY from ~/.env

Re-implementation of danielmiessler#687 with additional Linux/WSL2 support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
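The cross-platform notification rule (osascript on macOS, notify-send on Linux, otherwise silent) can be sketched as a command builder whose result is spawned with an argv array. The function name and the injected `hasCommand` probe are hypothetical; the AppleScript `display notification ... with title ...` form and notify-send's `SUMMARY BODY` arguments are standard.

```typescript
// Sketch: build the notification command per platform, or null for the silent fallback.
type Command = { command: string; args: string[] };

function notificationCommand(
  platform: string,
  title: string,
  message: string,
  hasCommand: (cmd: string) => boolean,
): Command | null {
  if (platform === "darwin") {
    // AppleScript string literals: escape backslashes first, then double quotes.
    const esc = (s: string) => s.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
    return {
      command: "osascript",
      args: ["-e", `display notification "${esc(message)}" with title "${esc(title)}"`],
    };
  }
  if (platform === "linux" && hasCommand("notify-send")) {
    // Passed as an argv array (no shell), so no shell quoting is needed.
    return { command: "notify-send", args: [title, message] };
  }
  return null; // silent fallback: no supported notifier available
}
```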

Expands the installer's Step 7 (Voice Setup) to support multiple TTS providers instead of hardcoding ElevenLabs:

- New provider selection prompt: ElevenLabs / Google Cloud TTS / Skip
- Google Cloud TTS path: key search, validation, Neural2-D default
- ElevenLabs path: unchanged existing flow
- settings.json gets ttsProvider + googleCloudVoice when Google is selected
- .env saves the correct key for the chosen provider
- Key validation for Google Cloud via texttospeech.googleapis.com
- Updated types, config-gen, detect, and steps descriptions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
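The PR names a `validateGoogleCloudKey` helper, but its body is not shown here. One plausible approach, assumed in this sketch, is a cheap authenticated read such as `GET v1/voices`, which succeeds only when the key is valid and the Text-to-Speech API is enabled. `checkGoogleCloudKey` is a hypothetical stand-in, not the PR's function.

```typescript
// Hypothetical key check: a read-only voices.list call succeeds only with a usable key.
type GetLike = (url: string) => Promise<{ ok: boolean; status: number }>;

async function checkGoogleCloudKey(
  apiKey: string,
  fetchFn: GetLike,
): Promise<{ valid: boolean; reason?: string }> {
  if (!apiKey.trim()) return { valid: false, reason: "empty key" };
  try {
    const url =
      "https://texttospeech.googleapis.com/v1/voices?languageCode=en-US&key=" +
      encodeURIComponent(apiKey);
    const res = await fetchFn(url);
    if (res.ok) return { valid: true };
    // 400/403 typically mean an invalid key or the API not being enabled for the project.
    return { valid: false, reason: `HTTP ${res.status}` };
  } catch (err: unknown) {
    // Network failures are reported rather than thrown, so the installer can re-prompt.
    return { valid: false, reason: err instanceof Error ? err.message : String(err) };
  }
}
```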

Ports the Linux service installation from danielmiessler#686 to v4.0.3:

- Platform detection at startup (Darwin/Linux)
- Linux: systemd user service instead of LaunchAgent
- Linux: checks for an audio player (mpv/ffplay/aplay) and notify-send
- Detects both ElevenLabs and Google Cloud API keys
- Menu bar indicator prompt only on macOS
- Removed macOS-specific "say" fallback references

Re-implementation of danielmiessler#686 install.sh changes for the current architecture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
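A systemd user service replaces the macOS LaunchAgent on Linux. The sketch below renders such a unit as text; the option names, paths, and command line are placeholders, not the values install.sh uses.

```typescript
// Sketch: render a systemd *user* unit for the voice server on Linux.
// Unit contents, paths, and the ExecStart command are placeholders, not the PR's values.
interface UnitOptions {
  description: string;
  execStart: string; // full command line, ideally with an absolute executable path
  workingDirectory: string;
  envFile?: string; // optional EnvironmentFile (KEY=VALUE lines)
}

function renderSystemdUserUnit(o: UnitOptions): string {
  const lines = [
    "[Unit]",
    `Description=${o.description}`,
    "",
    "[Service]",
    "Type=simple",
    `WorkingDirectory=${o.workingDirectory}`,
    `ExecStart=${o.execStart}`,
    ...(o.envFile ? [`EnvironmentFile=${o.envFile}`] : []),
    "Restart=on-failure",
    "",
    "[Install]",
    // default.target is the usual target for user services started at login
    "WantedBy=default.target",
  ];
  return lines.join("\n") + "\n";
}
```

The rendered text would be written to `~/.config/systemd/user/<name>.service`, followed by `systemctl --user daemon-reload` and `systemctl --user enable --now <name>.service`.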

When ELEVENLABS_API_KEY is missing or the ElevenLabs API call fails, the VoiceServer now falls back to the macOS native `say` command instead of silently skipping voice output. Pronunciation rules from pronunciations.json are applied to the fallback too.

- Only triggers when the ElevenLabs path didn't play (no double-speak)
- Reuses existing spawnSafe() and applyPronunciations() helpers
- Fails gracefully: logs the error, doesn't crash the server
- Uses `error: unknown` with an `instanceof` type guard
- TODO: Linux equivalent (see danielmiessler#855, danielmiessler#872)

Co-Authored-By: Claude <noreply@anthropic.com>
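The fallback ordering in this commit (try ElevenLabs, use `say` only if nothing played, never crash) can be sketched with injected dependencies. The `SpeakDeps` interface and `speak` function are hypothetical; in the real server, the injected functions would wrap helpers like `spawnSafe()` and `applyPronunciations()`.

```typescript
// Sketch of the fallback order: ElevenLabs first, `say` only if nothing played.
// All names here are hypothetical stand-ins for the server's real helpers.
interface SpeakDeps {
  elevenLabsKey?: string;
  playElevenLabs: (text: string, key: string) => Promise<void>;
  sayFallback: (text: string) => Promise<void>; // e.g. spawns `say <text>` on macOS
  applyPronunciations: (text: string) => string;
  log: (msg: string) => void;
}

async function speak(text: string, d: SpeakDeps): Promise<"elevenlabs" | "say" | "none"> {
  const spoken = d.applyPronunciations(text); // same rules for both paths
  if (d.elevenLabsKey) {
    try {
      await d.playElevenLabs(spoken, d.elevenLabsKey);
      return "elevenlabs"; // played: return early so `say` never double-speaks
    } catch (error: unknown) {
      d.log(`ElevenLabs failed: ${error instanceof Error ? error.message : String(error)}`);
    }
  }
  try {
    await d.sayFallback(spoken);
    return "say";
  } catch (error: unknown) {
    // Fail gracefully: log and report instead of crashing the server.
    d.log(`say fallback failed: ${error instanceof Error ? error.message : String(error)}`);
    return "none";
  }
}
```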

…ut device

New `voice.requireHeadphones` setting in settings.json (default: false). When enabled, VoiceServer checks the default audio output device via `system_profiler SPAudioDataType -json` and skips voice playback if the output is the built-in laptop speakers. Voice plays normally through Bluetooth, USB, HDMI, AirPlay, and other external audio devices.

- Uses the `-json` flag for reliable machine-parseable output
- Caches the detection result for 30 seconds (system_profiler takes 140-250 ms)
- A 3-second timeout prevents hangs if system_profiler stalls
- Fails open: if detection fails, voice plays anyway (convenience, not security)
- Desktop notification banners display regardless of headphone state
- Config uses `=== true` (opt-in; a missing key defaults to OFF)
- TODO: Linux equivalent (see danielmiessler#855, danielmiessler#872)

References danielmiessler#855

Co-Authored-By: Claude <noreply@anthropic.com>
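The gate's timing rules (30-second cache, 3-second timeout, fail open, opt-in) can be sketched independently of how the device is detected. `HeadphoneGate` and its `detect` callback are hypothetical. `detect` stands in for running `system_profiler SPAudioDataType -json` and resolving `true` when the default output is the built-in speakers; the exact JSON keys it would inspect are not shown in the PR, so they are not guessed here.

```typescript
// Sketch of the requireHeadphones gate: 30 s cache, 3 s timeout, fail open, opt-in.
class HeadphoneGate {
  private cached: { builtIn: boolean; at: number } | null = null;
  private readonly requireHeadphones: boolean;
  private readonly detect: () => Promise<boolean>;
  private readonly now: () => number;
  private readonly ttlMs: number;
  private readonly timeoutMs: number;

  constructor(opts: {
    requireHeadphones: boolean;
    detect: () => Promise<boolean>; // resolves true when output is built-in speakers
    now?: () => number;
    ttlMs?: number;
    timeoutMs?: number;
  }) {
    this.requireHeadphones = opts.requireHeadphones;
    this.detect = opts.detect;
    this.now = opts.now ?? (() => Date.now());
    this.ttlMs = opts.ttlMs ?? 30_000;
    this.timeoutMs = opts.timeoutMs ?? 3_000;
  }

  // Resolves true when voice should play.
  async allowVoice(): Promise<boolean> {
    if (!this.requireHeadphones) return true; // opt-in: gate is off unless enabled
    const t = this.now();
    if (this.cached && t - this.cached.at < this.ttlMs) return !this.cached.builtIn;
    try {
      const builtIn = await this.withTimeout(this.detect());
      this.cached = { builtIn, at: t };
      return !builtIn;
    } catch {
      return true; // fail open: detection errors and timeouts never silence voice
    }
  }

  private withTimeout<T>(p: Promise<T>): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error("audio detection timed out")), this.timeoutMs);
    });
    // Clear the timer either way so it cannot keep the process alive.
    return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
  }
}
```

The flag would be read as `settings.voice?.requireHeadphones === true`, matching the opt-in rule above.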

Merged origin/main into feat/google-cloud-tts-v2. Single conflict in actions.ts imports: kept both PAI_VERSION/ALGORITHM_VERSION from main and validateGoogleCloudKey from this branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fayerman-source and others added 2 commits March 2, 2026 14:19

fayerman-source changed the title ~~feat(voice-server): Add Google Cloud TTS + cross-platform audio playback~~ feat(voice-server + installer): Google Cloud TTS + cross-platform audio Mar 2, 2026

blu3dot mentioned this pull request Mar 3, 2026 in feat(voice): fall back to macOS say when ElevenLabs unavailable #897 (Closed)

blu3dot mentioned this pull request Mar 3, 2026 in feat(voice): add requireHeadphones config to gate voice on audio output device #898 (Closed)

fayerman-source wants to merge 4 commits into danielmiessler:main from fayerman-source:feat/google-cloud-tts-v2